fix: CustomItemSerializer not applied correctly in queries and SqlParameter [Cosmos]#48702
fix: CustomItemSerializer not applied correctly in queries and SqlParameter [Cosmos]#48702xinlian12 wants to merge 21 commits intoAzure:mainfrom
Conversation
…eline structures and is applied to SqlParameter serialization Fixes Azure#45521 Bug 1: Changed PipelinedDocumentQueryExecutionContext to set DEFAULT_SERIALIZER instead of null on internal request options, preventing getEffectiveItemSerializer() from falling back to the client-level custom serializer for internal pipeline processing. This fixes ORDER BY, GROUP BY, aggregate, DISTINCT, DCount, and hybrid search queries when a custom serializer is configured on CosmosClientBuilder. Bug 2: Added serializer-aware parameter serialization in the query execution path so SqlParameter values are serialized with the effective custom serializer. SqlParameter now stores the raw value and re-serializes it when applySerializer() is called from PipelinedQueryExecutionContextBase.createAsync(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…structor Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes Cosmos DB query behavior when a CustomItemSerializer is configured by preventing it from deserializing internal SDK query structures, and by ensuring query parameters can be serialized with the effective item serializer.
Changes:
- Force internal query pipeline request options to use
CosmosItemSerializer.DEFAULT_SERIALIZER(instead ofnull) to avoid leaking client-level custom serializers into internal deserialization. - Add internal plumbing to re-serialize
SqlParametervalues using the effective serializer at query execution time. - Introduce internal bridge method to apply parameter re-serialization across packages.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/SqlQuerySpec.java | Adds internal method to apply an effective serializer to all parameters. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/SqlParameter.java | Stores raw parameter values and adds internal re-serialization hook. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/ModelBridgeInternal.java | Exposes internal bridge method to apply serializer to query parameters. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedQueryExecutionContextBase.java | Applies effective serializer to parameters at query execution time. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/query/PipelinedDocumentQueryExecutionContext.java | Forces DEFAULT serializer on internal pipeline request options to prevent custom serializer leaks. |
| .coding-harness/spec.json | Adds harness metadata/spec for the reported issue and intended fix. |
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/SqlParameter.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/azure/cosmos/implementation/query/PipelinedQueryExecutionContextBase.java
Show resolved
Hide resolved
.../src/main/java/com/azure/cosmos/implementation/query/PipelinedQueryExecutionContextBase.java
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/models/ModelBridgeInternal.java
Outdated
Show resolved
Hide resolved
…ndition When a custom serializer is configured, applySerializerToParameters() mutates SqlParameter values in-place on the user's original SqlQuerySpec. If the same SqlQuerySpec is reused across concurrent queries with different serializers, the mutations race without synchronization. Defensive fix: clone the SqlQuerySpec and its SqlParameter list before applying the serializer, so the original object is never mutated by the SDK. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… method, use ImplementationBridgeHelpers accessor - C1: Fix forceSerialization=false to true in SqlParameter.applySerializer() - C4: Remove applySerializerToParameters from ModelBridgeInternal, use SqlQuerySpecHelper accessor pattern via ImplementationBridgeHelpers instead Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…aggregate, DISTINCT, and SqlParameter queries Covers fix for Azure#45521 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ze() The Spark connector and potentially other consumers implement CosmosItemSerializer with only deserialize() and stub serialize() as unimplemented. Guard with try-catch to fall back to default serialization when the custom serializer's serialize() is not available. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@sdkReviewAgent-2 |
|
✅ Review complete (54:56) No new comments — existing review coverage is sufficient. Steps: cross-sdk, history, past-prs, quality, sanity-check, synthesis, test-coverage |
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/CosmosItemSerializerTest.java
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/CosmosItemSerializerTest.java
Outdated
Show resolved
Hide resolved
…ializer in tests Address PR review comments from xinlian12 - use the clientSerializer variable (from getClientBuilder().getCustomItemSerializer()) instead of hardcoding EnvelopWrappingItemSerializer.INSTANCE_NO_TRACKING_ID_VALIDATION in the new query test methods. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - kafka |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
2 similar comments
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 1 pipeline(s). |
FabianMeiswinkel
left a comment
There was a problem hiding this comment.
LGTM - but there are still a few test issues
sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/CosmosItemSerializerTest.java
Show resolved
Hide resolved
SELECT VALUE COUNT(1) returns a scalar integer, so use Integer.class instead of ObjectNode.class as the result type. ObjectNode.class fails because ValueUnwrapCosmosItemSerializer extracts the _value field and cannot convert the resulting integer to ObjectNode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
eacb7b6 to
20bd377
Compare
Create a BasicCustomItemSerializer that mirrors the real-world use case from issue Azure#45521 — a simple ObjectMapper-based serializer with custom settings (dates as ISO strings via JavaTimeModule) without transforming the document structure. Changes: - Add BasicCustomItemSerializer inner class with custom ObjectMapper - Add it to the Factory data provider so query tests also run with it - Update query tests to use the custom serializer directly when the serializer does not transform document structure (vs falling back to DEFAULT_SERIALIZER for envelope-wrapping) - Use instanceof checks for EnvelopWrappingItemSerializer instead of identity comparison for robustness Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…back to default For queryWithAggregate, queryWithDistinct, and queryWithGroupBy custom serializer tests: skip the test entirely when EnvelopWrappingItemSerializer is used, rather than falling back to DEFAULT_SERIALIZER. These query result shapes (aggregates, projections) are not full documents and cannot be properly deserialized by the envelope-wrapping serializer. Also removed unused isEnvelopeWrapper variable from queryWithOrderBy. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e serializer test
Custom serializers' deserialize(Map<String, Object>, Class<T>) API is designed
for object-to-POJO mapping. SELECT VALUE COUNT(1) returns a scalar that gets
wrapped in a {"_value": N} Document by the aggregate pipeline, which cannot
be deserialized as Integer.class through a custom serializer.
Changed to SELECT COUNT(1) which returns an object {"$1": 3} that properly
goes through the Map-based deserialization path.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - kafka |
|
/azp run java - cosmos - spark |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Adds queryWithSqlParameterDateTimeAndCustomSerializer test that validates the end-to-end scenario: when a custom serializer writes Instant as an ISO-8601 string (not a timestamp), SqlParameter must apply the same serializer so the query filter matches the stored representation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run java - cosmos - kafka |
|
/azp run java - cosmos - spark |
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@sdkReviewAgent-2 |
|
Azure Pipelines successfully started running 1 pipeline(s). |
When SqlParameter.applySerializer() calls serialize(Instant), the previous implementation did customMapper.convertValue(item, Map.class) which fails for non-object values (Instant, String, numbers). Now serialize() first converts to JsonNode, checks isObject(), and for scalar values returns a single-entry Map with the well-known primitive value key so the framework correctly sets it as a scalar. This ensures SqlParameter values like Instant are serialized consistently with how the custom serializer stores them in documents (e.g. ISO-8601 strings). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| assertThat(results).isNotNull(); | ||
| assertThat(results).hasSize(1); | ||
| assertThat(results.get(0).id).isEqualTo(id); | ||
| assertThat(results.get(0).createdAt).isEqualTo(createdAt); |
There was a problem hiding this comment.
🔴 Blocking — Correctness: applySerializer is ineffective for scalar SqlParameter values
The CosmosItemSerializer.serialize() contract returns Map<String, Object>, which fundamentally cannot represent scalar values. For custom serializers like BasicCustomItemSerializer, serialize(scalar) calls customMapper.convertValue(scalar, Map.class), which always throws for non-Map-able types (String, Integer, Instant, etc.). The catch(Throwable) block silently swallows the exception, and the parameter retains its default-serialized representation.
Concrete failure trace for this test:
- Document created with
BasicCustomItemSerializer(WRITE_DATES_AS_TIMESTAMPS=false) →createdAtstored as ISO string"2026-03-15T10:30:00Z" SqlParameter("@createdAt", Instant)→setValue()storesrawValue=Instantand default-serializes to epoch number1773837000(SDK's ObjectMapper hasWRITE_DATES_AS_TIMESTAMPS=true)- Clone (line 78):
getValue(Object.class)reads the epoch number from JSON property bag → clone'srawValue = Long(1773837000)(type lost) applySerializer(BasicCustomItemSerializer)→serialize(Long)→convertValue(Long, Map.class)→ throwsIllegalArgumentExceptioncatch(Throwable)fires → parameter keeps epoch number1773837000- Query:
c.createdAt = @createdAtcompares string"2026-03-15T10:30:00Z"vs number1773837000→ 0 results assertThat(results).hasSize(1)→ test fails
This means Bug 2's fix (SqlParameter custom serialization) is a no-op for the exact scenario described in issue #45521 — scalar query parameters with types where the custom serializer changes the wire format.
Root cause: DefaultCosmosItemSerializer.serialize() handles scalars via PrimitiveJsonNodeMap (an internal type), but custom serializers can't — the Map<String, Object> return type makes it impossible to return a scalar from serialize().
Suggested approaches (pick one):
(a) Use the serializer's ObjectMapper directly for scalar parameter values:
void applySerializer(CosmosItemSerializer serializer) {
if (this.rawValue != null && serializer != null) {
ObjectMapper mapper = serializer.getItemObjectMapper();
JsonNode node = mapper.convertValue(this.rawValue, JsonNode.class);
this.jsonSerializable.getPropertyBag().set("value", node);
}
}But this requires BasicCustomItemSerializer (and customer serializers) to call setItemObjectMapper(customMapper) — document this contract.
(b) Add a serializeToNode(T item) method to CosmosItemSerializer with a default implementation using getItemObjectMapper(), and use that in applySerializer instead of serialize().
| List<SqlParameter> originalParams = original.getParameters(); | ||
| if (originalParams != null) { | ||
| for (SqlParameter p : originalParams) { | ||
| clonedParams.add(new SqlParameter(p.getName(), p.getValue(Object.class))); |
There was a problem hiding this comment.
🔴 Blocking — Correctness: Clone via getValue(Object.class) loses the original Java type
This line reconstructs the SqlParameter value by reading from the JSON property bag (getObject("value", Object.class) → getValue(JsonNode)), not from rawValue. The getValue(JsonNode) method converts JSON nodes back to Java primitives: LongNode → Long, DecimalNode → Double, TextNode → String.
For an Instant parameter that was default-serialized as epoch-seconds (a numeric node), getValue(Object.class) returns a Long — the clone's rawValue is Long, not the original Instant. When applySerializer later tries to re-serialize this Long with the custom serializer, it serializes the wrong type.
Fix: Expose rawValue through a package-private accessor and use it for cloning:
// In SqlParameter.java (package-private):
Object getRawValue() { return this.rawValue; }
// Here:
clonedParams.add(new SqlParameter(p.getName(), p.getRawValue()));Or add a package-private copy constructor:
SqlParameter(SqlParameter original) {
this.jsonSerializable = new JsonSerializable(
original.jsonSerializable.getPropertyBag().deepCopy());
this.rawValue = original.rawValue;
}This is a prerequisite for any fix to the applySerializer scalar issue — even after serialize() is made to handle scalars, the clone still needs to preserve the original Java type.
| if (this.rawValue != null && serializer != null) { | ||
| try { | ||
| this.jsonSerializable.set("value", this.rawValue, serializer, true); | ||
| } catch (Throwable t) { |
There was a problem hiding this comment.
🟡 Recommendation — Error Handling: catch(Throwable) catches fatal JVM errors; log level too low
Two issues with this error handling:
-
ThrowablecatchesErrorsubclasses —OutOfMemoryError,StackOverflowError,ThreadDeath, andVirtualMachineErrorare fatal JVM conditions that should never be silently swallowed. The intended use case (Spark connector stubs throwingUnsupportedOperationException) is anException, not anError. -
LOGGER.debug()masks incorrect query results — When the serializer fails and falls back to default serialization, the parameter value may have a different wire representation than the document field (e.g., epoch number vs ISO string). The query silently returns zero results. At DEBUG level, this is invisible in production. The log message says "does not support serialize()" but the actual failure mode (scalar-to-Map conversion failure) is different and much harder to diagnose.
// Suggested fix:
} catch (Exception e) {
LOGGER.warn(
"Custom serializer '{}' failed to serialize SqlParameter value of type '{}'; "
+ "falling back to default serialization. Query results may not match if the "
+ "custom serializer changes the wire format for this type.",
serializer.getClass().getSimpleName(),
this.rawValue.getClass().getSimpleName(),
e);
}|
✅ Review complete (50:22) Posted 3 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Previously caught Throwable, silently swallowing all serialization errors and falling back to default serialization. This masked real bugs where the custom serializer's output didn't match the document's stored format, causing queries to silently return wrong results. Now only catches: - UnsupportedOperationException (Java standard for 'not implemented') - scala.NotImplementedError (Scala's ??? operator, used by Spark connector) All other exceptions propagate so serializer bugs are surfaced immediately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When cloning SqlParameters for custom serializer application, the previous code used p.getValue(Object.class) which reads from the JSON property bag. This loses the original Java type (e.g. Instant becomes Long) because the JSON property bag stores the default-serialized representation. Now uses getRawValue() to preserve the original Java type, ensuring that applySerializer() re-serializes the correct type with the custom serializer. Changes: - SqlParameter: add package-private getRawValue() accessor - SqlQuerySpec: add cloneSqlParameter() to bridge accessor - ImplementationBridgeHelpers: add cloneSqlParameter to SqlQuerySpecAccessor - PipelinedQueryExecutionContextBase: use cloneSqlParameter() for cloning Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Fixes #45521
Two bugs with
CustomItemSerializer:Bug 1 Deserialization failures in ORDER BY/GROUP BY/aggregate/DISTINCT queries
When
customItemSerializeris configured onCosmosClientBuilder, complex queries (ORDER BY, GROUP BY, VALUE, COUNT, SUM, DISTINCT, HybridSearch) fail because the custom serializer leaks into the internal query pipeline and is used to deserialize internal SDK structures (OrderByRowResult,Document, etc.).Root cause:
PipelinedDocumentQueryExecutionContext.createBaseComponentFunction()calledsetCustomItemSerializer(null)on cloned request options, butRxDocumentClientImpl.getEffectiveItemSerializer()falls back to the client-leveldefaultCustomSerializerwhen the request-level serializer isnull.Fix: Set
CosmosItemSerializer.DEFAULT_SERIALIZERinstead ofnullon internal request options. This causesgetEffectiveItemSerializer()to returnDEFAULT_SERIALIZERimmediately without falling through to the client-level custom serializer.Bug 2 SqlParameter ignores customItemSerializer
SqlParameter.setValue()delegates toJsonSerializable.set()which hardcodesINTERNAL_DEFAULT_SERIALIZER. Custom serialization settings (e.g., dates as ISO strings) are never applied to query parameters.Fix: Added
rawValuefield toSqlParameterto retain the original value, plusapplySerializer()to re-serialize with a custom serializer. Called fromPipelinedQueryExecutionContextBase.createAsync()where the effective serializer is known.Changes
PipelinedDocumentQueryExecutionContext.javanullDEFAULT_SERIALIZERin 4 internal pipeline pathsPipelinedQueryExecutionContextBase.javaSqlParameter.javarawValuefield +applySerializer()methodSqlQuerySpec.javaapplySerializerToParameters()Testing
Build compiles cleanly. Existing tests unaffected. Additional test cases for ORDER BY/GROUP BY/aggregate queries with custom serializer and SqlParameter serialization should be added.